home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Internet Info 1993
/
Internet Info CD-ROM (Walnut Creek) (1993).iso
/
inet
/
internet-drafts
/
draft-ietf-iiir-wais-00.txt
< prev
next >
Wrap
Text File
|
1993-10-26
|
13KB
|
323 lines
IETF IIIR Working Group M. St. Pierre, WAIS Inc
INTERNET--DRAFT J. Fullton, CNIDR
Category: Informational K. Gamiel, CNIDR
November 1993 J. Goldman, Thinking Machines Corp
B. Kahle, WAIS Inc
J. A. Kunze, UC Berkeley
H. Morris, WAIS Inc
F. Schiettecatte, FS Consulting
WAIS over Z39.50-1988
<draft-ietf-iiir-wais-00.txt>
1. Status of this Memo
This memo provides information for the Internet community. This memo
does not specify an IAB standard of any kind. Distribution of this
memo is unlimited.
This document is an Internet Draft. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas,
and its Working Groups. Note that other groups may also distribute
working documents as Internet Drafts.
Internet Drafts are draft documents valid for a maximum of six
months. Internet Drafts may be updated, replaced, or obsoleted
by other documents at any time. It is not appropriate to use
Internet Drafts as reference material or to cite them other than
as a "working draft" or "work in progress."
Please check the I-D abstract listing contained in each Internet
Draft directory to learn the current status of this or any
other Internet Draft.
This Internet Draft expires May 1, 1994.
2. Introduction
The network publishing system, Wide Area Information Servers (WAIS), is
designed to help users find information over a computer network. The
principles guiding WAIS development are:
1. A wide-area networked-based information system for searching,
browsing, and publishing.
2. Based on standards.
3. Easy to use.
4. Flexible and growth oriented.
From this basis, a large group of developers, publishers, standards
bodies, libraries, government agencies, schools, and users have been
helping further the WAIS system.
The WAIS software architecture has four main components: the client,
the server, the database, and the protocol. The WAIS client is a
user-interface program that sends requests for information to local or
remote servers. Clients are available for most popular desktop
environments. The WAIS server is a program that services client
requests, and is available on a variety of UNIX platforms. The server
generally runs on a machine containing one or more information sources,
or WAIS databases. The protocol, Z39.50-1988, is used to connect WAIS
clients and servers and is based on the 1988 Version of the NISO Z39.50
Information Retrieval Service and Protocol Standard. The goal of the
WAIS network publishing system is to create an open architecture of
information clients and servers by using a standard
computer-to-computer protocol that enables clients to communicate with
servers.
WAIS development began in October 1989 with the first Internet release
occurring in April 1991. From the beginning, WAIS committed to use the
Z39.50-1988 standard as the information retrieval protocol between WAIS
clients and servers. The implementation is still in use today by
existing WAIS clients and servers resulting in over 50,000 users of
Z39.50-1988 on the Internet.
3. Purpose
The purpose of this memo is to initiate a discussion for a migration
path of the WAIS technology from Z39.50-1988 Information Retrieval
Service Definitions and Protocol Specification for Library Applications
[1] to Z39.50-1992 [2] and then to Z39.50-1994 [3]. The purpose of
this memo is not to provide a detailed implementation specification,
but rather to describe the high-level design goals and functional
assumptions made in the WAIS implementation of Z39.50-1988. WAIS use
of Z39.50-1992 and Z39.50-1994 standards will be the subject of future
RFCs.
4. Historical Design Goals of WAIS
As an aid to understanding the original WAIS implementation and its use
of Z39.50-1988, the historical design goals of WAIS are presented in
this section. Included with each goal is a brief description of the
assumptions used to meet these design goals.
1. Provide users access to bibliographic and non-bibliographic
information, including full-text and images.
Because Z39.50-1988 grew out of the bibliographic community, additional
assumptions with the protocol were required to serve non-bibliographic
information. They were also necessary to serve documents existing in
multiple formats (e.g., rtf, postscript, gif, etc).
2. Keep the client/server interface simple and independent of changes
in the functionality of the server.
To achieve this, the text string entered by the user was transmitted to
the server without parsing the string into a Type-1 RPN (reverse-polish
notation) query, as is common for bibliographic applications. Instead
WAIS defined a new Type-3 query containing the text string. In this
way, knowledge of the Z39.50 Attributes supported by the server was no
longer required by the client or the user, as is true of many existing
Z39.50 implementations. In addition, the client software did not
require modification to support the evolving functionality of the
server.
3. Provide relevance feedback capability.
Relevance feedback is the ability to select a document, or portion of a
document, and find a set of documents similar to the selection. WAIS
included documents used in relevance feedback as part of the Type-3
query.
4. Permit the server to operate in a stateless manner.
A WAIS server was designed to be "stateless", meaning that search
result sets were not stored by the server. In Z39.50 terms, the server
exercised its right to unilaterally delete a result set as soon as it
sent the search response. For this reason, the Present Facility of
Z39.50 was not used, and retrievals were performed using the Search
Facility. Relaxing this constraint in future implementations may prove
the most prudent path.
5. Provide the ability for a client to retrieve documents in pieces.
Because retrieval of a portion of a document could be done several ways
with Z39.50-1988, specific assumptions were made to implement this
functionality. Accessing a portion of a document was required for both
retrieval and for relevance feedback.
6. Run over TCP.
The Z39.50-1988 standard was designed to run in the application layer
using the presentation services provided by the Open Systems
Interconnection (OSI) Reference Model. Due to the popularity of TCP/IP
and the Internet, WAIS was designed to run over TCP. Use of Z39.50
over TCP is described in [4].
5. WAIS Implementation of Z39.50-1988
By working with the Z39.50 Implementors Group (ZIG), the WAIS
developers used a recommended subset of Z39.50-1988 and specific
assumptions to fulfill its requirements. Over time, many of these
requirements have then gone into the definition of subsequent versions
of Z39.50. As new requirements become apparent, WAIS will document any
additional assumptions and work with the ZIG in developing extensions.
WAIS supported the Init and Search Facilities of Z39.50-1988. Both
search and retrieval were implemented using the Search Facility, as
described in this section.
Search was initiated by the client with a Search Request APDU
(Application Protocol Data Unit) using a Type-3 query. The query
contained two main fields:
1. The "seed words", or text, typed by the user.
2. A list of document objects, where a document object is a full
document, or portion thereof, to be used in relevance feedback.
Each document object contains a document identifier (Doc-ID) [5],
type, chunk-code, and start and end locations. The Doc-ID and
type specify the location and format, respectively, of the
document. The chuck-code determines the unit of measure for the
start and end locations. Examples of chunk-codes used include
byte, line, paragraph, and full document. If the chunk code is a
full document, the start and end locations are ignored.
A Search Response APDU returned by the server contained a relevance
ranked list of records, or WAIS Citations. A WAIS Citation refers to a
document on the server. Each WAIS Citation contains the following
fields:
1. Headline - a set of words that convey the main idea of the
document.
2. Rank - the numerical score of the document based on its relevance
to the query, normalized to a top score of 1000.
3. List of available formats - e.g. text, postscript, tiff, etc.
4. Doc-ID - the location of the document.
5. Length - the length of the document in bytes.
The number of WAIS Citations returned was limited by the preferred
message size negotiated during the Init.
Retrieval of a document was initiated by the client with a Search
Request APDU using a Type-1 query. The query contained up to four
terms:
1. Term: Doc-ID
Use Attribute: system-control-number code = "un"
Relation Attribute: equal code = "re"
2. Term: the requested document format
Use Attribute: data-type code = "wt"
Relation Attribute: equal code = "re"
3. Term: the start location
Use Attribute: paragraph, line, byte code = "wp", "wl", "wb"
Relation Attribute: greater-than-or-equal code = "ro"
4. Term: the end location
Use Attribute: paragraph, line, byte code = "wp", "wl", "wb"
Relation Attribute: less-than code = "rl"
Because full-text and images were often larger in size than the receive
buffer of the client, clients were designed to optionally retrieve
documents in chunks, specifying the start and end positions of the
chunk in the query. An example of a fully-specified retrieval query
is:
query = ( ( use = "un", relation = "re", term = <Doc-ID> )
AND
( use = "wt", relation = "re", term = postscript )
AND
( use = "wb", relation = "ro", term = 0 )
AND
( use = "wb", relation = "ro", term = 2000 )
)
A retrieval response was issued by the server with a Search Response
APDU. In this case a single record corresponding to the requested
document, or portion thereof, was returned in the specified format.
6. Security Considerations
This RFC raises no security issues.
7. References
[1] National Information Standards Organization (NISO). American
National Standard Z39.50, Information Retrieval Service
Definition and Protocol Specifications for Library Applications,
New Brunswick, NJ, Transaction Publishers; 1988.
[2] ANSI/NISO Z30.50-1992 (version 2) Information Retrieval Service
and Protocol: American National Standard, Information Retrieval
Application Service Definition and Protocol Specification for
Open Systems Interconnection, 1992.
[3] Z39.50 Version 3: Draft 8", October 1993. Maintenance Agency
Reference: Z39.50MA-034.
[4] Internet Draft, "Using the Z39.50 Information Retrieval Protocol
in the Internet Environment", Clifford Lynch, November 1993.
[5] "Document Identifiers, or International Standard Book Numbers
for the Electronic Age", Brewster Kahle, Thinking Machines
Corporation, see URL=<ftp://wais.com/pub/protocol/doc-ids.txt>,
September 1991.
8. Author's Address
Name: Margaret St. Pierre
Affiliation: WAIS Incorporated
Address: 1040 Noel Drive
Menlo Park, California 94025
Phone: (415) 327-WAIS
Fax: (415) 327-6513
EMail: saint@wais.com
Name: Jim Fullton
Affiliation: Clearinghouse for Networked Information
Discovery & Retrieval
Address: 3021 Cornwallis Road
Research Triangle Park, North Carolina 27709-2889
Phone: (919)-248-9247
Fax: (919)-248-1101
EMail: jim.fullton@cnidr.org
Name: Kevin Gamiel
Affiliation: Clearinghouse for Networked Information
Discovery & Retrieval
Address: 3021 Cornwallis Road
Research Triangle Park, North Carolina 27709-2889
Phone: (919)-248-9247
Fax: (919)-248-1101
EMail: kevin.gamiel@cnidr.org
Name: Jonathan Goldman
Affiliation: Thinking Machines Corporation
Address: 1010 El Camino Real, Suite 310
Menlo Park, California 94025
Phone: (415) 329-9300 x229
Fax: (415) 329-9329
EMail: jonathan@think.com
Name: Brewster Kahle
Affiliation: WAIS Incorporated
Address: 1040 Noel Drive
Menlo Park, California 94025
Phone: (415) 327-WAIS
Fax: (415) 327-6513
EMail: brewster@wais.com
Name: John A. Kunze
Affiliation: UC Berkeley
Address: 289 Evans Hall
Berkeley, California 94720
Phone: (510) 642-1530
Fax: (510) 643-5385
EMail: jak@violet.berkeley.edu
Name: Harry Morris
Affiliation: WAIS Incorporated
Address: 1040 Noel Drive
Menlo Park, California 94025
Phone: (415) 327-WAIS
Fax: (415) 327-6513
EMail: morris@wais.com
Name: Francois Schiettecatte
Affiliation: FS Consulting
Address: 435 Highland Avenue
Rochester, New York 14620
Phone: (716) 256-2850
EMail: francois@wais.com